Notebook version of demo_load_show_sheet.py

To follow along, start Jupyter by running jupyter notebook at the command line or via the Anaconda tool on Windows.


In [10]:
import pandas as pd

df = pd.read_excel("sheet_1_with_simple_logic.xls")
print(df)


   Feature1  Feature2 DecisionF1F2         TVShow  Decision2 Decision3  \
0       0.6       0.6         True      Hollyoaks          1      True   
1       0.4       0.6        False      hollyoaks          1     False   
2       0.3       0.4        False     Hollyoaks           0     False   
3       0.9       0.8         True      hollyoaks          1      True   
4       0.9       0.8         True     holly-oaks          0     False   
5       0.9       0.8         True  best TV shows          0     False   

                    Comment  
0                       NaN  
1                       NaN  
2  trailing-space on TVShow  
3                       NaN  
4        badly-spelt TVShow  
5                       NaN  
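
The printed table makes it hard to spot the trailing space that the Comment column flags for row 2. A minimal sketch of two ways to make it visible, assuming a hand-built stand-in for the TVShow column (the notebook itself reads these values from the spreadsheet):

```python
import pandas as pd

# Hand-built stand-in for the TVShow column (an assumption for this sketch;
# the notebook loads the real values from the .xls file)
tv = pd.Series(
    ["Hollyoaks", "hollyoaks", "Hollyoaks ", "hollyoaks", "holly-oaks", "best TV shows"],
    name="TVShow",
)

# repr() wraps each value in quotes, so trailing whitespace becomes visible
quoted = tv.apply(repr)
print(quoted)

# Flag values that change when surrounding whitespace is stripped
has_padding = tv != tv.str.strip()
print(has_padding)
```

Only row 2 should be flagged, which matches the note in the Comment column.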

In [11]:
df.head() # this creates a Table view (non-interactive but prettier)
# NOTE: head() shows the first 5 rows by default and we have 6, so the last row is missing


Out[11]:
   Feature1  Feature2  DecisionF1F2  TVShow         Decision2  Decision3  Comment
0  0.6       0.6       True          Hollyoaks      1          True       NaN
1  0.4       0.6       False         hollyoaks      1          False      NaN
2  0.3       0.4       False         Hollyoaks      0          False      trailing-space on TVShow
3  0.9       0.8       True          hollyoaks      1          True       NaN
4  0.9       0.8       True          holly-oaks     0          False      badly-spelt TVShow
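
As a quick check of the default row count, here is a small sketch using a one-column stand-in frame with 6 rows (the values are copied from Feature1; the frame itself is an assumption, not the loaded sheet):

```python
import pandas as pd

# Stand-in frame with 6 rows, like the sheet
df = pd.DataFrame({"Feature1": [0.6, 0.4, 0.3, 0.9, 0.9, 0.9]})

print(len(df))           # total number of rows
print(len(df.head()))    # head() defaults to n=5
print(len(df.head(10)))  # asking for more rows than exist returns them all
print(df.tail(1))        # the row that head() left out
```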

In [19]:
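df.head(10) # ask for up to 10 rows so that all 6 are shown
# Feature1_Times_2 appears here because this cell was run after that column was added in a later cell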


Out[19]:
   Feature1  Feature2  DecisionF1F2  TVShow         Decision2  Decision3  Comment                   Feature1_Times_2
0  0.6       0.6       True          Hollyoaks      1          True       NaN                       1.2
1  0.4       0.6       False         hollyoaks      1          False      NaN                       0.8
2  0.3       0.4       False         Hollyoaks      0          False      trailing-space on TVShow  0.6
3  0.9       0.8       True          hollyoaks      1          True       NaN                       1.8
4  0.9       0.8       True          holly-oaks     0          False      badly-spelt TVShow        1.8
5  0.9       0.8       True          best TV shows  0          False      NaN                       1.8

In [12]:
print("Column names:", df.columns)


Column names: Index(['Feature1', 'Feature2', 'DecisionF1F2', 'TVShow', 'Decision2',
       'Decision3', 'Comment'],
      dtype='object')

In [13]:
print("Information about each column including data types:")
print("(note - type 'object' is a catch-all that includes strings)")
df.info()


Information about each column including data types:
(note - type 'object' is a catch-all that includes strings)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 5
Data columns (total 7 columns):
Feature1        6 non-null float64
Feature2        6 non-null float64
DecisionF1F2    6 non-null bool
TVShow          6 non-null object
Decision2       6 non-null int64
Decision3       6 non-null bool
Comment         2 non-null object
dtypes: bool(2), float64(2), int64(1), object(2)
memory usage: 300.0+ bytes
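
The same per-column facts can be pulled out as ordinary pandas objects rather than printed text. A rough sketch, assuming a hand-built frame that mirrors a few of the sheet's columns (it is not the loaded spreadsheet):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in mirroring some of the sheet's columns (an assumption)
df = pd.DataFrame({
    "Feature1": [0.6, 0.4, 0.3, 0.9, 0.9, 0.9],
    "DecisionF1F2": [True, False, False, True, True, True],
    "Decision2": [1, 1, 0, 1, 0, 0],
    "Comment": [np.nan, np.nan, "trailing-space on TVShow",
                np.nan, "badly-spelt TVShow", np.nan],
})

dtypes = df.dtypes                # one dtype per column
non_null = df.notnull().sum()     # non-null count per column, as info() reports

print(dtypes)
print(non_null)
```

Here non_null["Comment"] is 2, matching the "2 non-null" line that info() reports for Comment.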

In [14]:
print("\nWe can extract a column of data as a Series object:")
print(df['Feature1'])


We can extract a column of data as a Series object:
0    0.6
1    0.4
2    0.3
3    0.9
4    0.9
5    0.9
Name: Feature1, dtype: float64

In [15]:
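row = df.iloc[0]  # .ix has been removed from pandas; use .iloc (by position) or .loc (by label)
print("\nWe can extract a row as a Series (it can be indexed like a Python dictionary):")
print(row)


We can extract a row as a Series (it can be indexed like a Python dictionary):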
Feature1              0.6
Feature2              0.6
DecisionF1F2         True
TVShow          Hollyoaks
Decision2               1
Decision3            True
Comment               NaN
Name: 0, dtype: object
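
A short sketch of the row-access options, assuming a small stand-in frame with the default integer index (so position 0 and label 0 are the same row). If a real Python dictionary is wanted, to_dict() converts the Series:

```python
import pandas as pd

# Small stand-in frame with the default 0..n-1 index (an assumption)
df = pd.DataFrame({
    "Feature1": [0.6, 0.4, 0.3],
    "TVShow": ["Hollyoaks", "hollyoaks", "Hollyoaks "],
})

row = df.iloc[0]       # first row by position
same_row = df.loc[0]   # row whose index label is 0 (the same row here)

print(type(row).__name__)   # Series
print(row["Feature1"])      # dictionary-style access by column name

as_dict = row.to_dict()     # a real Python dict, if one is needed
print(as_dict)
```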

In [16]:
print("\nRow items, e.g. Feature1={feature1}".format(feature1=row['Feature1']))


Row items, e.g. Feature1=0.6000000000000001

In [17]:
def multiply_feature1_by_2(cell):
    return cell * 2

# apply() calls our function once per cell of a Series (here we pull out the Feature1 Series)
df['Feature1'].apply(multiply_feature1_by_2)
# note this doesn't change the DataFrame, it generates a new separate Series
# and here we just print it and then discard it


Out[17]:
0    1.2
1    0.8
2    0.6
3    1.8
4    1.8
5    1.8
Name: Feature1, dtype: float64
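
To confirm that apply() leaves the DataFrame alone, here is a minimal sketch using a one-column stand-in frame (an assumption; the values are copied from Feature1):

```python
import pandas as pd

# One-column stand-in for the sheet
df = pd.DataFrame({"Feature1": [0.6, 0.4, 0.3, 0.9, 0.9, 0.9]})

def multiply_feature1_by_2(cell):
    return cell * 2

# apply() returns a new Series; df itself is not modified
doubled = df["Feature1"].apply(multiply_feature1_by_2)

print(list(df.columns))         # still just the original column
print(df["Feature1"].tolist())  # original values are unchanged
print(doubled.tolist())         # the doubled values live in the new Series
```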

In [20]:
# we can assign the result back to the DataFrame as a new column
new_result = df['Feature1'].apply(multiply_feature1_by_2)
df['Feature1_Times_2'] = new_result
df.head(10)


Out[20]:
   Feature1  Feature2  DecisionF1F2  TVShow         Decision2  Decision3  Comment                   Feature1_Times_2
0  0.6       0.6       True          Hollyoaks      1          True       NaN                       1.2
1  0.4       0.6       False         hollyoaks      1          False      NaN                       0.8
2  0.3       0.4       False         Hollyoaks      0          False      trailing-space on TVShow  0.6
3  0.9       0.8       True          hollyoaks      1          True       NaN                       1.8
4  0.9       0.8       True          holly-oaks     0          False      badly-spelt TVShow        1.8
5  0.9       0.8       True          best TV shows  0          False      NaN                       1.8
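
For simple arithmetic like this, apply() is not strictly needed: multiplying the Series directly gives the same values. A sketch on a one-column stand-in frame (an assumption, not the loaded sheet), shown as an alternative rather than as what the original script does:

```python
import pandas as pd

# One-column stand-in for the sheet
df = pd.DataFrame({"Feature1": [0.6, 0.4, 0.3, 0.9, 0.9, 0.9]})

via_apply = df["Feature1"].apply(lambda cell: cell * 2)  # cell-by-cell
vectorised = df["Feature1"] * 2                          # whole Series at once

print(via_apply.equals(vectorised))  # both routes give the same Series

# Either result can be assigned back as a new column
df["Feature1_Times_2"] = vectorised
print(df)
```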

In [ ]: